Transcriptome variations in hybrids of wild emmer wheat (Triticum turgidum ssp. dicoccoides)

Background Wild emmer wheat is a great candidate to revitalize domesticated wheat genetic diversity. Recent years have seen intensive investigation into the evolution and domestication of wild emmer wheat, including whole-genome DNA and transcriptome sequencing. However, the impact of intraspecific hybridization on the transcriptome of wild emmer wheat has been poorly studied. In this study, we assessed changes in methylation patterns and transcriptomic variations in two accessions of wild emmer wheat collected from two marginal populations, Mt. Hermon and Mt. Amasa, and in their stable F4 hybrid. Results Methylation-Sensitive Amplified Polymorphism (MSAP) detected significant cytosine demethylation in F4 hybrids vs. parental lines, suggesting potential transcriptome variation. After a detailed analysis, we examined nine RNA-Seq samples, which included three biological replicates from the F4 hybrid and its parental lines. RNA-Seq databases contained approximately 200 million reads, with each library consisting of 15 to 25 million reads. There are a total of 62,490 well-annotated genes in these databases, with 6,602 genes showing differential expression between F4 hybrid and parental lines Mt. Hermon and Mt. Amasa. The differentially expressed genes were classified into four main categories based on their expression patterns. Gene ontology (GO) analysis revealed that differentially expressed genes are associated with DNA/RNA metabolism, photosynthesis, stress response, phosphorylation and developmental processes. Conclusion This study highlights the significant transcriptomic changes resulting from intraspecific hybridization within natural plant populations, which might aid the nascent hybrid in adapting to various environmental conditions. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-024-05258-3.


Background
Wheat is one of the most important crops for human consumption, playing a significant role in global food production [1].Ensuring a higher yield of wheat is crucial to maintaining global food security [2].Global wheat production in 2021 reached 777 million tons according to the Food and Agriculture Organization of the United Nations (http://www.fao.org/faostat/en/#home).Modern agriculture prioritizes pure breeding, leading to a narrow gene pool among wheat species despite high yield [3][4][5].The loss of genetic diversity in wheat has increased vulnerability to diseases, pests, and climate change, posing a threat to communities and countries [4][5][6][7].Without adaptation to climate change, crop production is predicted to decrease with a 2 °C temperature increase from late 20th-century levels, according to the IPCC [8].During certain wheat growth stages, drought can cause up to a 92% loss in yield [9].Moreover, certain diseases like Ug99 (a highly virulent race of stem rust fungus), a fungal infection that caused severe damage to wheat cultivation in Africa and the Middle East, pose a significant danger to global food security, and adversely affect wheat growth, leading to a massive impact on food security and human health all around the world.
A potential source to revive genetic diversity in domesticated wheat is wild emmer wheat (Triticum turgidum ssp.dicoccoides), one of the progenitors of modern wheat.The genetic diversity of wild emmer wheat has been preserved, allowing it to remain adaptive to various environmental and biotic stressors [3,10,11].The high genetic diversity of wild emmer wheat makes it a crucial subject for crop research, as it provides highly sought-after traits and mechanisms for environmental adaptation and stress resistance.
In Israel, the wild emmer's habitat spans from Mt. Hermon to Mt. Amasa in the Judean Desert [12,13].Although it is geographically small, this region boasts a diverse range of habitats, each with its unique ecological conditions.These habitats vary in elevation, ranging from 200 m below sea level in the Jordan Valley to 1600 m above sea level at Mt. Hermon.Additionally, they differ in soil types, temperatures, and other biotic and abiotic conditions [12,14].Previous studies have primarily focused on the genetic diversity of wild emmer wheat populations within Israel, at both macro and micro scales, with findings suggesting correlations between genetic diversity, ecological traits, and geographical location [10,11,13,[15][16][17].The distribution of wild emmer wheat populations in Israel was influenced by population position (core or marginal) and size [13].Additionally, studies on Israeli emmer populations have revealed non-random genetic differentiation linked to soil, topography, and climate at both single and multi-locus levels [10].Finally, the study by Venetsky et al., (2015) detected population-specific methylation patterns in the whole genome and around transposable elements in five geographically isolated populations in Israel.Furthermore, these population-specific methylation patterns were heritable and passed down from the first to the second generation.Together with the phenotypic plasticity of wild emmer wheat and its population spread across a wide range of habitats in Israel, these findings suggest underlying adaptive genetic and epigenetic mechanisms.
Whole-genome sequencing breakthroughs have greatly improved the sequencing of wheat genomes.The draft sequence of wild emmer wheat [18] provided an opportunity to investigate and assess transcriptomic changes within natural populations, revealing functional adaptations to different environments.For example, Yin et al. [19]., identified and isolated a single dominant powdery mildew resistance gene from the wild emmer wheat natural population found at Mt. Carmel in Israel, showing the possibilities and opportunities for domesticated wheat improvement can be found in the wild emmer wheat.
In this study, we aimed to explore changes in methylation patterns and transcriptomic variations in wild emmer wheat accessions and their hybrid.We selected parental lines from marginal Israeli wild emmer wheat populations -Mt.Hermon and Mt.Amasa -and their stabilized F4 hybrid.To determine the changes in epigenomic patterns, we utilized MSAP (Methylation-Sensitive Amplified Polymorphism), which is a modified version of the typical AFLP, to estimate methylation patterns on a genome-wide scale.For analyzing transcriptomic variations, we conducted RNA-Seq analysis.We then characterized the differentially expressed genes between the parental lines and their F4 hybrid at the molecular level.Additionally, we analyzed and discussed the functions and biological pathways of the differentially expressed genes.

Phenotypic assessments in hybrids vs. parental lines
Plant height, seed yield, and spike number were analyzed in this study in the parental lines and the F4 hybrid.The F4 hybrid had intermediate seed yield and spike number but was significantly taller than the parental lines (Fig. S1, Additional file 2).The plant height was measured once a month for three months.Measurements started around Feekes 3.0-4.0level when at least 3-5 tillers were visible.Height was measured from ground to edge of the highest tiller or edge of the canopy, excluding spikes.Hight measurements of F4 were significantly higher than all the parental lines.In addition, chlorophyll content was measured in fresh leaf tissues.This analysis used nine leaf samples from parental groups and an F4 hybrid (3 biological replicates from each plant).Total chlorophyll content averaged at 0. 1 ± 0.006 in Mt.Amasa, 0.1 ± 0.01 in Mt.
Hermon, and 0.11 ± 0.01 in F4 hybrid (Fig. S2, Additional file 2).To this end, no significant differences were seen between parental lines and the F4 hybrid.

Cytosine methylation status in parental lines and their hybrid
Cytosine methylation patterns were examined to determine whether significant epigenetic alterations exist between hybrids and parental lines.We utilized MSAP to determine the cytosine methylation status at CCGG sites.Our study involved the analysis of twelve plants to compare the methylation differences between parental lines and F4 hybrid.For each group of the the parental lines, Mt.Hermon and Mt.Amasa, and the F4 hybrids, we conducted four biological replicates,.To this end, we conducted three MSAP reactions using different selective primer combinations of EcoRI/MspI and EcoRI/HpaII (primer sequences are listed in Table S1, Additional file 1).
In each MSAP reaction, we analyzed 200-300 peaks comprising 740 CCGG sites (265 bands in primer pair E-ACA/HM-TCAA, 200 bands in primer pair E-ACT/ HM-TCAA, and 275 bands in primer pair E-ACC/HM-TCAA).Methylation levels were calculated by dividing the number of polymorphic bands between treatments (HpaII and MspI) by the total band count.The presence of a monomorphic MSAP site in both MspI and HpaII patterns indicates an unmethylated CCGG site.Polymorphic bands are observed when a band appears in the MspI pattern, but not in the HpaII pattern.This indicates that the internal cytosine in that particular CCGG site is methylated.Conversely, when a band appears in the HpaII pattern, but not in the MspI pattern, it indicates that the external cytosine in the same CCGG site is methylated in one of the DNA strands.This is known as hemi-methylation status.Note that methylation of both cytosines at the CCGG site prevents PCR amplification in both MspI and HpaII patterns.Using HpaII patterns, we observed that global methylation patterns in hybrids were intermediate between both parental lines (see supplemental Fig. S3, Additional file 2 ) The Hermon parental line had an average cytosine methylation level of ∼ 85.6%, while the Amasa parental line had an average methylation level of ∼ 82.4% (see Table S2, Additional file 1).The methylation level in F4 hybrids was approximately ∼ 74.5%, suggesting demethylation of CCGG sites in hybrids compared to parents.This may be associated with higher genome expression.
Alterations in DNA methylation patterns at specific CCGG sites were identified by analyzing the difference in band composition between parental lines and hybrids.To illustrate, a polymorphic band that was present only in the parental lines and not in F4 hybrids was considered a demethylation event in the F4 hybrids.Conversely, a polymorphic band found only in F4 hybrids indicated a hypermethylation event (see supplemental Fig. S4, Additional file 2 for an example of such alteration).Detailed analysis showed demethylation of 50 CCGG sites and hypermethylation of 26 sites in the F4 hybrid, indicating a significant decrease in methylation.

Transcriptome analysis in the parental lines and their F4 hybrid
To evaluate transcriptome changes in wild emmer wheat hybrids and their parental lines, we performed RNA-Seq analysis on an F4 hybrid and its parents.

Library quality
Leaf total RNA samples collected from the parental lines, Mt.Hermon (MH) and Mt.Amasa (MA), as well as their F4 hybrid, were fully sequenced.To obtain comprehensive coverage and perform accurate statistical analysis, we sequenced three biological replicates of the leaves simultaneously.As part of this effort, we collected RNA-Seq data from 9 libraries (3 from each of the two parents and 3 from the F4 hybrid), with each library containing between 15 and 25 million reads, resulting in a total of approximately 200 million reads.Alignment of the RNA-Seq data to the reference RNA-seq data of WEWSeq 1.0 acc.Zavitan resulted in a mapping rate of approximately 82% for each of the three libraries.A total of 62,490 genes were identified in the parental and hybrid libraries, with ∼ 38,000 genes in common.

Variations in gene expression patterns
The study used Principal Component Analysis (PCA) to compare the gene expression patterns between the parental lines and their F4 hybrid.The analysis was conducted on normalized gene counts of more than 38,000 genes identified through RNA-Seq.The results showed that the first and second components (PC1 and PC2) explained 63.38% and 21.53% of the total variation in the transcriptome (Fig. 1).PCA analysis separates the parental lines and F4 hybrid (Fig. 1).F4 hybrid was positioned between the parental lines on the PC1 axis, but appeared less similar to them on the PC2 axis (Fig. 1).It was observed that the main differences within the groups were found in F4 and Mt.Amasa.On the PC1 and PC2 axes, Mt.Amasa showed the most significant variation within the group, which was more pronounced on the PC1 axis.On the other hand, F4 exhibited relatively high variation within the group over the PC2 axis.This within-group variation was also noticed when differentially expressed genes were analyzed.In addition, the heatmap (Fig. 2) displaying the total transcriptome expression value per sample supported the PCA analysis.The heatmap suggests that the F4 hybrids are positioned between the parental lines and are slightly closer to Mt. Hermon.Moreover, it highlights the within-group variation observed in both F4 and Mt.Amasa to a greater extent.

Differentially expressed genes between parental lines vs. F4 hybrid
Differential expression analysis was performed using DESeq2.It is important to note that significant differential expression levels between groups were considered to be ∼ 2-fold or more.In total, 6602 genes showed differential expression between the parental lines and the F4 hybrid.Out of these, 3679 genes showed differential expression between the Mt.Hermon line and the F4 hybrid, while 2311 genes showed differential expression between the Mt.Amasa line and the F4 hybrid.Additionally, 612 genes demonstrated similar expression levels in both parental lines, while different expression levels were observed in the F4 hybrid.
ClusterGap analysis [20] of 6602 differentially expressed genes in parental lines vs. F4 hybrids revealed eight patterns of clusters (Fig. 3).Each cluster represents a group of genes with similar expression patterns in the parental lines and F4 hybrid (Fig. 3).Overall, the eight clusters can be classified into four expression pattern categories: (1) Clusters 1, 3, 4, and 5 consists of 3576 genes that show under-expression in one parental line vs.F4 hybrid (Figs. 3 and 4a).( 2) Clusters 2 and 8 consists of 2414 genes overexpressed in one parental line vs.F4 hybrid (Figs. 3 and 4c).(3) Cluster 6 consists of 362 genes under-expressed in both parental lines vs. F4 hybrid (similar expressions in both parents) (Figs. 3 and 4b).(4) Cluster 7 consists of 250 genes overexpressed in both parental lines vs. F4 hybrid (similar expressions in both parents) (Figs. 3 and 4d).

SNPs detection between parental lines vs. hybrids
We used the GATK variant calling to identify differences in Single Nucleotide Polymorphisms (SNPs) between the parental lines and their hybrid.After filtering, we obtained 13,170 high-quality SNPs (HQ SNPs) with an overall GQ > 15 across all samples.Out of these, 5,818 SNPs were found in the Mt.Amasa parental line and 7,291 SNPs were found in the Mt.Heromn parental line.Due to the small sample size, we only considered SNPs that matched all three samples per group.In total, we found 1,403 SNPs in 477 differentially expressed genes.
The clusters 1, 4, 5 of Mt.Hermon showed overexpression, while cluster 2 showed underexpression.The largest SNP groups, consisting of 641 and 312 SNPs, were associated with over and under-expressed genes respectively.These SNPs were linked to 273 and 113 genes for over and under-expressed genes, respectively.In Mt.Amasa, clusters 8 and 3 displayed over and under-expressed Fig. 1 PCA plot of F4 hybrids and Mt.Amasa (MA) and Mt.Hermon (MH) parental lines.PCA was generated from over 38,000 genes identified through RNA-Seq.PC1 explains 64% of the variance, separates between the groups F4 and parental lines.PC2 explains that 20% of the variance separates between parental lines and F4 hybrid genes.The SNPs linked with these genes numbered 194 and 151, respectively, and were associated with 73 and 57 genes.F4 had clusters 6 and 7, with upregulated and downregulated genes respectively.These clusters had 38 and 67 SNPs each, linked with 18 genes in each group.SNPeff was used to assign SNPs to their respective genes and then rank each SNP based on its possible effect on those genes.Out of the total SNPs, 642 had a predicted low impact on genes (synonymous SNPs), 303 SNPs had a moderate impact on genes (non-synonymous SNPs), 6 SNPs had a high effect on genes (premature stop codon), and 452 SNPs were predicted to be modifier SNPs (SNPs found in UTR).
After that, we examined whether the SNPs matched the expression pattern revealed by the differential expression analysis.We considered SNPs matched SNPs only if they were foundin line with expression patterns.Specifically, SNPs in genes overexpressed in Mt.Hermon (clusters 1, 4, and 5) were only found in Mt.Hermon samples but not in F4 and Mt.Amasa samples, or only in F4 and Mt.Amasa but not in Mt.Hermon.To this end, out of the 1403 SNPs that were observed, 1024 (73%) matched with the expression patterns.Mt.Hermon over and under-expressed clusters had 20-30% unmatched SNPs (clusters 1,4,5, and 2, respectively), while Mt.Amasa over and under-expressed clusters (3 and 8, respectively) had less than 1% unmatched SNPs.The F4 over and underexpressed clusters (6 and 7, respectively) had approximately 90% unmatched SNPs (Fig. S5, Additional file 2).

Gene annotations and functional analysis
We used the gene annotation data, publicly available for WEWSeq 1.0 acc.Zavitan genome, as a reference to analyze the differentially expressed genes in both parental lines and F4 hybrid.To understand the gene functions and biological pathways, we performed a Gene Ontology assessment using SEA analysis in agriGO v2.0 [21].To this end, 2680 genes were annotated and significantly classified into; biological processes (BP), cellular components (CC), and molecular function (MF) pathways (Fig. S6, Additional file 2).Overexpressed genes in Mt.Amasa parents (cluster 8) were associated with 25 BP GO terms, while under-expressed genes (cluster 3) were associated with 2 BP, 29 MF, and 7 CC GO terms (Fig. S6, Additional file 2).Overexpressed genes in Mt.Hermon parents (clusters 1,4 and 5) were associated with 152 BP, 73 MF, and 7 CC terms, while under-expressed genes (cluster 2) were associated with 6 BP and 1MF terms (Fig. S6, Additional file 2).Overexpressed genes in the F4 hybrid (cluster 6) were associated with 4 BP, 32MF, and 3 CC terms (Fig. S6, Additional file 2).To visualize the difference in specific GO terms between parental lines and F4 hybrid genotypes, we used over and under-expressed enriched GO categories from clusters 1,2,3,4,5, 6, and 8 to run a REVIGO analysis, which summarizes GO results by removing redundant terms and providing graph-based visualization [21].
In the molecular function class (MF), significant GO terms associated with under-expressed genes in one parental line vs.F4 hybrid (GO terms were found only in clusters 1, 3, and 5) were annotated as phosphotransferase activity, alcohol group as acceptor (GO: 0016773, 244 genes), purine nucleoside binding (GO:0001883, 481 genes) (Fig. 5d, Fig. S8a, b, Additional file 2).Overexpressed genes in the F4 hybrid (cluster 6) were annotated as electron transporter, transferring electrons within Fig. 3 Clustered differentially expressed gene expression by samples.Differential expression analysis revealed a total of 6602 differentially expressed genes among samples.K-mean analysis was performed on a Z-score scaled rlog transformed expression data.Each line represents the mean expression pattern of samples per cluster.Y-axis represents the Z-score scaled rlog transformed expression level.Sample names are on the X-axis.MH -Mt.Hermon samples, MA -Mt.Amasa samples the cyclic electron transport pathway of photosynthesis activity (GO:0045156, 6 genes) and chlorophyll-binding (GO:0016168, 8 genes) (Fig. 4c).
In the cellular component class (CC), GO terms were found only in genes associated with under-expressed genes in one parental line vs.F4 hybrids (GO terms were found only in clusters 1 and 3) and overeexpressed genes in F4 hybrid (cluster 6).Genes underexpressed in one parental line were annotated as photosystem I (GO:0009522, 15 genes) and membrane part (GO:0044425, 199 genes), and protein-DNA complex (GO:0032993, 21 genes) (Fig. S9 a, b, Additional file Fig. 4 Examples of expression patterns of the four cluster categories.Y-axis notes normalized average counts for each group taken from DESeq2. A. Gene TRIDC1AG025480 encodes to a protein associated with plant basic secretory protein (BSP) family related with defense response.This gene belongs to cluster 1, under-expressed genes in Mt.Hermon compared to F4. B. Gene TRIDC2BG018130 encodes to a protein associated with photosystem II.This gene belongs to cluster 6, under-expressed in both parental lines compared to F4. (C) Gene TRIDC1AG036170 is an orthologue of genes encoding to Annexin superfamily protein associated with stress response.This gene belongs to cluster 8, Overexpressed in Mt.Amasa compared to F4. (D) Gene TRIDC2AG021310 encodes to a protein associated with sucrose synthase activity.This gene belongs to cluster 7, overexpressed in both parental lines compared to F4 2).Under-expressed genes in the F4 hybrid were annotated as photosystem II (GO:0009523, 8 genes).Note that a detailed list of all annotated genes can be found in Additional file 3 and Fig. S5, Additional file 2.

Discussion
Wheat production has hit breaking record yields thanks to the implementation of modern agricultural and pure lines breeding programs.However, growing worldwide food demand requires constant improvement and increases in wheat production [22].An important factor hindering the attempt to increase wheat yield is the narrow and homogeneous gene pool among domesticated wheat species, making wheat more susceptible to diseases, pests, and climate change [3][4][5]22].In the past few decades, scientists have looked to the wild ancestors of wheat, like wild emmer wheat, as potential sources for rejuvenating the genetic diversity of cultivated wheat.The significant potential of wild emmer wheat in providing crucial genes for enhanced yield, adaptability to various environmental conditions, and resilience against both biotic and abiotic stressors has gained widespread recognition.In addition, the recent breakthroughs in the whole-genome sequencing [6] and the whole-genome assembly of the wild emmer wheat genome [18] have provided an opportunity to investigate and assess the In this novel research, changes in methylation patterns and transcriptomes were analyzed in two distinct natural populations of wild emmer wheat-Mt.Hermon and Mt.Amasa-as well as in their genetically stable F4 hybrid.These populations represent the cold and hot edge spread of wild emmer wheat populations in Israel and face different environmental challenges.Understanding the underlying mechanism of these populations' specific adaptations has been an essential goal in wheat research throughout the years [4,5,11,23].Our objective was to investigate alterations in methylome and transcriptome levels, as well as identify differentially expressed genes and pathways.This analysis aimed to provide deeper insights into how functional variation contributes to the adaptation of wild emmer wheat to diverse habitats.We observed a notable reduction in methylation levels in the F4 hybrids when compared to the parental lines.Furthermore, we pinpointed 6602 genes that exhibited differential expression between Mt.Hermon, Mt.Amasa, and their F4 hybrid.Gene ontology analysis associated these genes with photosynthesis, defense response, and phosphorylation.

DNA methylation status in parental lines vs. hybrids
Reduced methylation levels, known as demethylation, have been linked to various factors including responses to environmental stresses, defense mechanisms, activation of transposable elements, and gene expression [24,25].The presence of environmental stressors can disrupt the normal growth and development of plants, prompting adjustments in genomic DNA methylation levels as a responsive mechanism to counteract the stress [26,27].In rice (Oryza sativa), it has been reported that drought stress triggers widespread alterations in the level of DNA methylation across the genome [27].In maize, methylation levels were observed to decrease by 1.0-2.2%under cold stress [26].A reduction in methylation can also take place following a 'genomic shock' , as seen in events like hybridization or allopolyploidization [28].As an illustration, in maize, hybrids derived from two inbred lines exhibited decreased methylation levels in comparison to their corresponding inbred counterparts, with the highest degree of demethylation observed in the hybrids [29].Similarly, in this study, the methylation levels in the F4 hybrids were found to be lower compared to the parental lines.Upon our evaluation of the overall methylation changes at CCGG sites, we observed a greater number of demethylation events in the F4 hybrid compared to hypermethylation events.Research conducted on soybeans has shown that the greater the reduction in methylation levels, the more pronounced its impact on gene expression levels [30].In this study, we observed a reduction in methylation levels, leading to a higher occurrence of demethylation events in the F4 hybrids when compared to their parental lines.These results suggest a potential elevation in gene expression.Nevertheless, it's important to note that we did not directly assess the impact of methylation decrease on gene expression in this study.

Transcriptome variation
Through a comprehensive examination of genetic diversity and a differential expression analysis, our objective was to discern the influence of each parental lineage on hybrid gene expression.Various models have been proposed to anticipate this effect.The additive genetic model posits that gene expression is equally inherited from both parents, resulting in intermediate expression levels in the hybrid.On the other hand, the dominance genetic model proposes that expression levels are determined by specific alleles, resulting in similarity to one of the parental lineages [31][32][33].Additional models, such as the parental effect model, propose that gene expression in hybrids is governed by parental mechanisms like maternal provisioning or genomic imprinting.Consequently, it resembles the gene expression pattern of either the maternal or paternal lineage [32,34].Despite our main focus not being the determination of the most fitting model for this case study, it is apparent that the overall genetic diversity of the F4 hybrid lies between that of both parental groups, signifying an intermediate state (Fig. 1).In our examination of differential expression, we found that both the Mt.Amasa group and F4 exhibited a higher count of genes displaying notable expression variations, encompassing both overexpression and underexpression, as opposed to the Mt.Hermon group.Notably, the most prominent cluster of differentially expressed genes was the underexpressed category in Mt.Hermon, which comprised three distinct clusters (1,4 and 5, Fig. 4).This set of 2,433 genes accounts for 37% of the differentially expressed genes.These findings align more closely with the additive or parental effect models rather than the overdominance model.This is because the hybrid exhibited a relatively modest number of differentially expressed genes when compared to both parental lines (clusters 6 and 7).While we may not have definitively identified the most fitting model, it is evident that while the F4 hybrid overall genetic diversity was intermediate between both parental groups (Figs. 1 and 2), its differentially expressed genes displayed a distinct expression bias towards the paternal line, Mt.Amasa.

SNPs in differentially expressed genes
Employing SNPs (Single Nucleotide Polymorphisms) is a crucial methodology in genetics, breeding, as well as ecological and evolutionary research.SNPs serve to create genetic markers, uncover cis-regulatory variations, and pinpoint genes linked to particular genetic traits.The utilization of RNA-seq data for SNP identification, although a relatively recent approach, offers several advantages.Firstly, it enables the identification of thousands of SNPs that are correlated with the expression levels of functional genes.Secondly, these SNPs can be linked to biological traits and used to evaluate phenotypes based on genotype.Finally, this method proves to be cost-effective in comparison to other approaches [35,36].
However, it's important to note that when utilizing RNA-seq data for variant calling and SNP identification, researchers should be mindful of the potential for false positives and the necessity for high-quality sequencing.The precision of SNP discovery is significantly influenced by factors such as pair-end/single-end sequencing and the length of reads [36].
To mitigate these challenges, we implemented stringent and cautious filtering protocols in this study.Additionally, our analysis was grounded in the well-established relationships between parental groups and hybrids, further bolstering the validity of each identified SNP.Moreover, we consider the identification of SNPs within differentially expressed genes as a preliminary indication, providing a solid foundation for subsequent research and identification efforts.We aimed to determine if SNPs within these genes exhibit a correlation with expression patterns.Our results demonstrated a notable correlation between the expression patterns and the SNPs identified in differentially expressed genes specific to one parental line (clusters 1, 2, 3, 4, 5, and 8).Notably, SNPs in the F4 hybrids consistently originated from one parental line, and we did not observe any entirely new SNPs in the F4 hybrids.

Hybridization effects
In natural populations, hybridization constitutes a significant evolutionary mechanism capable of augmenting populations.It achieves this by introducing adaptive traits, generating novel lineages, bolstering overall genetic diversity, and engendering heightened phenotypic traits a phenomenon commonly referred to as heterosis or hybrid vigor [37,38].Although hybrid vigor is known to be greatly reduced after the first generation, studies have shown that DNA methylation changes associated with small interfering RNAs in hybrids can influence gene expression patterns related to growth and stress responses, potentially extending the benefits of hybrid vigor beyond the F1 generation [39,40].Additionally, hybridization can catalyze the development of more robust reproductive barriers, ultimately culminating in population isolation.This is achieved through the creation of sterile or necrotic hybrids, which hinder gene flow and foster genetic divergence between populations [37,38].To assess the functionality of genes impacted by hybridization, we employed Gene Ontology (GO) annotation analysis (Fig. 5, Fig. S6-S9, Additional file 2).Our findings indicate that hybridization between accessions from Mt. Amasa and MH populations may have an impact on defense response, photosynthesis, phosphorylation, as well as DNA and RNA metabolism.Notably, hybrids exhibited expression patterns similar to those of Mt.Amasa in the subset of differentially expressed genes that were under-expressed in Mt.Hermon (clusters 1, 4, 5, Fig. 4).The identified genes were linked to critical processes such as photosynthesis (specifically the dark reaction), stress response, DNA and RNA metabolism, and developmental processes.These terms suggest enhanced resilience and adaptability in hybrids, potentially offering increased stress tolerance and metabolic capabilities.For example, an increase in carbon fixation and photosynthesis [40] indicates potential improvements in growth and energy utilization [41], which are crucial for plant survival under diverse environmental conditions.
It is possible that the Mt.Amasa parental line can transmit these beneficial traits through hybridization (Fig. S7c,  d, Additional file 2).Conversely, the Mt.Hermon line primarily transmitted traits linked to the MF classification associated with nucleic binding and enzymatic activity.Terms such as tRNA binding (GO:0000049), methionine-tRNA ligase activity (GO:0004825), and nucleotide binding (GO:0000166) suggest improvements in protein synthesis and general metabolic efficiency [42].(Fig. S9b, Additional file 2).Morover, the F4 hybrid did not exhibit any reduced expression levels of advantageous traits when comparing differentially expressed genes against both parental lines (cluster 7).Additionally, the F4 hybrid displayed high expression levels in genes associated with photosynthesis (GO:0015979 and GO:0019684) and protein-chromophore linkage (GO:0018298) (cluster 6, Fig. S7e, Additional file 2), in hybrids compared to parental lines which suggest an increase in photosynthetic efficiency and light-utilization mechanisms.increased photosynthesis indicates that hybrids may achieve higher rates of carbon fixation and more effective energy conversion [43].Additionally, improved protein-chromophore interactions imply advanced capabilities in light sensing [44].These findings indicate that the F4 hybrid might demonstrate some vigorous traits associated with photosynthesis.However, the phenotypic examinations conducted during the study did not detect differences in chlorophyll content between the F4 hybrids and parental lines (Fig. S2).The lack of observable differences in chlorophyll content does not rule out the possibility of other differences in photosynthetic processes in the F4 hybrids, suggesting that further detailed examinations are necessary to uncover this potential.

Conclusions
To sum up, intraspecific hybridization within natural plant populations, particularly in wild emmer wheat, has been a relatively understudied area.This study delves into the impact of hybridization on methylation alterations, transcriptome-wide changes, differentially expressed genes, and associated pathways.We observed a decrease in methylation levels in hybrids, although we couldn't directly link this decrease to the observed gene expression patterns.Additionally, we identified notable effects on DNA and RNA metabolism, photosynthesis, phosphorylation and developmental processes.Notably, most differentially expressed genes related to these traits were predominantly inherited from one parental lineage.Future investigations should focus on understanding how variations in gene expression between hybrids and their parental lines contribute to the adaptation of the nascent hybrid to diverse environmental conditions.

Plant material
Wild emmer wheat accessions from two marginal and natural populations (Mt.Hermon and Mt.Amasa) were chosen for hybridization.These populations represent the most northern population (Mt.Hermon, Coordinates: 32.7461°N, 35.0258°E) and the most southern (Mt.Amasa, Coordinates: 31.3525°N,35.1103°E ).Previous work in our lab found a population-specific genetic and epigenetic profile in both populations [11].Accessions from both populations were collected, hybridized, and kindly provided by Dr. Sergei Volis.Reciprocal hybridization was performed by crossing one plant from Hermon and one from Amasa to create the F1 generation.The hybrids underwent selfing to produce subsequent generations, including the F2, F3, and F4 generations ( For parental lines and F2-F4 example see Fig. S10, Additional file 2).There were two maternal lineages: Mt.Hermon's maternal lineage and Mt.Amasa's maternal lineage.Plants were grown in a greenhouse under common garden conditions at Ben-Gurion University between October 2018 to April 2018.All plants throughout all experiments were grown in a 4-liter pot, one for each.The validations of hybrid in the reciprocal crosses were tested by typical AFLP DNA markers (data not shown).Leaf samples for RNA and DNA extraction were collected between the 4th and the 6th -week post-germination.All hybrid plants used in this study were taken from the Hermon maternal lineage.

DNA extraction and methylation level assessment
Twelve plants were used to assess differences in methylation between parental lines and F4 hybrids.For each group of the parental lines, Mt.Hermon and Mt.Amasa, and the F4 hybrids, we conducted four biological replicates.DNA was extracted using a DNeasy Plant MINI Kit (Qiagen).DNA quality and concentrations were determined on a 1% gel agarose and by NanoDrop®.
The methylation level was assessed using MSAP (Methylation-Sensitive Amplified Polymorphism).MSAP is a modification of AFLP.It involves two isoschizomers, HpaII and MspI [11,45,46].Both enzymes cut unmethylated CCGG sites.The genomic restriction fragments are then amplified using two PCR reactions.
HpaII and MspI differ in their sensitivity to the external or internal cytosine methylation state at the restriction stage.HpaII will cleave in cases of external cytosine hemimethylation (only one strand is methylated).MspI cleaves when the internal cytosine is methylated.MSAP creates a pattern of monomorphic bands between the HpaII and MspI isoschizomers.If both isoschizomers digest the DNA templates (from the same DNA sample), it indicates that the CCGG site is unmethylated, while polymorphic bands indicate methylated sites.After the restriction stage, fragments were ligated to adaptor DNA, which includes an overhang complementary to the overhang produced by the restriction enzyme samples.Samples were then amplified twice: The first PCR amplification was a non-selective amplification using primers complementary to the adaptor sequence.The second PCR amplification was a selective amplification using the same primers from the non-selective amplification but with the addition of three random nucleotides.The methylation level of an individual was measured as the number of polymorphic bands between the MspI and HpaII MSAP reactions in the same individual out of the total number of MSAP bands.Primer sequences used in this study are shown in Additional File 1.
Selective amplification products were electrophoresed in a 3730xl DNA analyzer (Applied Biosystems) and analyzed using GeneMapper v6.0 (Applied Biosystems).Results from GeneMapper were transferred to an Excel table that summarized the presence (1) / absence (0) of bands at every site in all samples.The methylation level was calculated manually as the number of polymorphic bands between HpaII and MspI patterns divided by the total number of bands (loci) in the two patterns.

RNA extraction and sequencing
Overall, nine plant samples were selected for RNA-Seq analysis; three from each one of the parental lines, Mt.Hermon and Mt.Amasa, and three from the F4 hybrid.RNA was extracted using a ZR Plant RNA mini-prep kit (Zymo Research, Irvine, USA).Samples were sent to the Technion genome center (Haifa, Israel) for sequencing.RNA quality was assessed using Agilent Bioanalyzer.RNA Libraries were prepared using Illumina TruSeq RNA Library Preparation Kit v2.(Illumina).RNA sequencing was performed on two lanes on the Illumine Hiseq2500 machine at a read length of 50 bp SR.Quality control was evaluated using FASTQC v0.11.5 [47].RNA-seq reads were aligned against wild emmer wheat reference genome WEWSeq 1.0 acc.Zavitan [18] using Salmon [48].Tximport [49] was used to aggregate transcript-level counts and abundances into gene-level counts and abundances before normalization and differential expression analysis.

Differential gene expression analysis
Normalization and differential gene expression analysis were conducted using the 'DESeq2' R package [50].Differential gene expression pairwise comparison (e.g., F4 Vs.MH group) was performed using a Wald test (adjusted p-value > 0.05).Genes with fold-change > 2 were considered viable for further analysis.Differentially expressed gene counts were transformed using rlog transformation.Rlog transformation transforms the normalized counts to the log2 scale to minimize differences between samples with low and high counts.

SNPs and variant calling
Variant calling was performed to observe if there is a possible effect of Single Nucleotide Polymorphism (SNP) on the observed DEGs.Cleaned reads were mapped using the STAR program [51].All three groups (parental and F4 hybrids) were mapped to the wild emmer wheat reference genome WEWSeq 1.0 acc.Zavitan, After filtration (described in the following paragraphs) F4 hybrids samples were matched against each parental group in order to identify actual SNPs found in our samples [18].SNPs were identified using the Genome Analysis Toolkit (GATK, version 4.0.5.0) [52] recommended pipeline, with hard SNPs filtering as an exception.SNP variant calling was done using HaplotypeCaller and GenotypeGVCFs packages of GATK.
As wild emmer wheat is not liable for the GATKs VQSR filter tool, high-strength filtering based on GATKrecommended parameters was done.Each parameter threshold was determined following data distribution and used stricter thresholds than GATK-recommended thresholds.Credible variants were defined as variants that satisfied the following parameters: quality depth (QD) < 2.0, FisherStrand (FS) > 60.0, RMSMappingQuality (MQ) < 40.0, and ReadPosRankSum < -8.0.
Filtered SNPs were annotated using SNPeff [53].SNPeff assigns SNPs to their respective gene and predicts the possible effect onset genes (e.g., change in amino acids) [53].SNPs were then uploaded to R for further analysis.Additional filtration was done using the GQ (Genotype quality) parameter.SNPs with GQ higher than 15 across all samples were used for further analysis.SNP's homo/ heterozygosity for each sample was determined using AB (allelic balance) parameter.SNPs with AB between 0.2 and 0.8 were considered heterozygotes.Finally, we examined any correlation between SNPs' appearance in differentially expressed genes and the expression level in each group (parental and hybrids).For example, SNPs in genes overexpressed in one parental group could only be found in that set group but not in hybrids or the other parental group or vice versa, these SNPs could only be found in hybrids and the other parental group.

Gene functional classification and pathway analysis
Genes were annotated using matched orthologue genes using Ensemblplants Biomart [54] and BLASTn (E-value cutoff of 10 − 10 , Additional file 3) against bread wheat (Triticum aestivum), Arabidopsis (Arabidopsis thaliana), and rice (Oryza sativa Japonica).AgriGO v2 [21] was used to perform Gene Ontology (GO) singular enrichment analysis (SEA) for all differentially expressed genes.GO is a functional tool that classifies and characterizes information on an attribute of gene products in distinct, not overlapping, groups [21].GO analysis takes a group of genes associated with a specific criterion (e.g., similar gene expression).It identifies biological processes, cellular components, or molecular functions overrepresented in that group compared to a background population from which the query list is derived.Differentially expressed genes (DEGs) were used as gene groups while the wild emmer wheat reference genome WEWSeq 1.0 acc.Zavitan [18] was used as the background population.Reduction and Visualization GO analyses were performed using REVIGO (http://revigo.irb.hr/).REVIGO removes redundant GO terms using SimRel semantic similarity measure (the analysis allowed for a medium level of similarity, set at 0.7.) [55].

Statistical analysis
Pairwise comparison between two specific groups (for example, F4 to MH group) was performed using a Wald test (adjusted p-value > 0.05).
To observe the overall genetic distance between and within sample groups, we used Principal component analysis (PCA) and heatmap.
PCA observed the overall genetic distance between samples.It is a technique used to highlight variation in data and flash out strong patterns in multi-dimensional data sets such as transcriptomic data Fields [56].The output of a PCA transformation can be used to make data easy to explore and visualize.PCA projects data onto a two-dimensional (or three-dimensional in some cases) plane so that they spread out in the two directions that explain most of the variance between samples [57].The first principal component (PC1) explains the most significant possible variance in the data (e.g., 60% of the variance between control and tested samples is explained by the genes defining PC1).PC2 explains the second most significant portion of variance, and so on.Traditionally, the first few PCs are used for visualization since they capture most variation from the original data set [58].
PCA plot was conducted using log-transformed data.PCA graph was plotted using the ggplot2 package in R [59].
Transformed DEGs counts were scaled using Z-score (standard score) To perform K-mean cluster analysis and heatmaps of differentially expressed genes.Scaling was performed to identify clusters of genes with similar expression profiles rather than similar expression levels.
A total samples expression level heatmap was constructed using the 'pheatmap' package in R [60] Samples were sorted by hierarchical clustering using the default clustering provided by the pheatmap package.
DEGs were clustered according to their expression level across all samples using K-mean.The k-mean cluster uses an a priori number of clusters to split the data accordingly.The number of clusters (the number of Ks) was determined by applying Gap statistics [61] using the clusGap.
function in R [20].K-means clustering was performed according to the number cluster using the K-means base function in R. Clusters were then subjected to gene ontology analysis using AgriGO.

Fig. 2
Fig. 2 Heatmap of Euclidian distance between samples.Samples (F4 hybrids, Mt.Amasa (MA), and Mt.Hermon (MH) parental lines) from each group were sorted by hierarchical clustering (top).Color corresponds with similarity.Dark blue indicates high similarity, while light blue indicates low similarity

Fig. 5
Fig. 5 Significantly enriched GO categories were projected onto a two-dimensional semantic space using REVIGO.Color intensity reflects the significance of the enrichment test, with dark colors corresponding to lower P values and white to P closer to 0.05.Circle size indicates the frequency of the GO term in the underlying GOA database (bubbles of more general terms are larger).Revigo assigns names to bubbles representing terms with low dispensability value, meaning non-redundant terms concerning semantically close terms.A. BP in cluster 1, Under expressed in Mt.Hermon.B. BP in cluster 8, Overexpressed in Mt.Amasa.C. MF in cluster 6, Under expressed in both parental lines.D. MF in cluster 5, Under expressed in Mt.Hermon